Space Lower Bounds for Online Pattern Matching 

Raphael Clifford,* Markus Jalsenius,* Ely Porat,^ Benjamin Sach* 



Abstract 

We present space lower bounds for online pattern matching under a number of different 
distance measures. Given a pattern of length m and a text that arrives one character at a 
time, the online pattern matching problem is to report the distance between the pattern 
and a sliding window of the text as soon as the new character arrives. We require that 
the correct answer is given at each position with constant probability. We give $\Omega(m)$ bit
space lower bounds for $L_1$, $L_2$, $L_\infty$, Hamming, edit and swap distances as well as for any
algorithm that computes the cross-correlation/convolution. We then show a dichotomy
between distance functions that have wildcard-like properties and those that do not. In
the former case which includes, as an example, pattern matching with character classes,
we give $\Omega(m)$ bit space lower bounds. For other distance functions, we show that there
exist space bounds of $\Omega(\log m)$ and $O(\log^2 m)$ bits. Finally we discuss space lower bounds
for non-binary inputs and show how in some cases they can be improved.

1 Introduction

We combine existing results with new observations to present an overview of space lower bounds
for online pattern matching. Given a pattern that is provided in advance and a text that arrives
one character at a time, the online pattern matching problem is to report the distance between
the pattern and a sliding window of the text as soon as the new character arrives. In this
formulation, the pattern is processed before the first text character arrives and once processed,
the pattern is no longer available to the algorithm unless a copy is explicitly made.

This problem has recently attracted a great deal of interest, with breakthrough results given
for exact matching and pattern matching under bounded Hamming distance ($k$-mismatch) [13].
For both problems it was shown that space sublinear in the size of the pattern is sufficient to
give the correct answer at every alignment with high probability. These remarkable results
immediately raise a number of significant unresolved questions. The first is for which other
distance measures between strings sublinear space randomised online algorithms might be
achievable, and it is this question which we address here.
Our presentation is divided between what we term local and non-local online pattern matching
problems. In the former case the distance function between a pattern $P$ of length $m$ and
an $m$-length substring of the text $T$, starting at position $i$, is defined by

$$\mathrm{LocalPM}_{(\oplus,\Delta)}(P,T) = \bigoplus_{j=0}^{m-1} \Delta(P[j], T[i+j]),$$

where $\oplus$ and $\Delta$ are both binary operators. In Section 4 we show $\Omega(m)$ bit space lower bounds
for online pattern matching for the local problems of $L_1$, $L_2$ and Hamming distance as well
as for any algorithm that computes the cross-correlation/convolution.

We then go on to show in Section 5 a space dichotomy for local online pattern matching
problems of the form $d(i) = \bigwedge_{j=0}^{m-1} \Delta(P[j], T[i+j])$ where the range of $\Delta$ is {True, False}.
Where the distance function $\Delta$ has wildcard-like properties (q.v. Section 5), we give an $\Omega(m)$
space lower bound. Where it does not, we have $\Omega(\log m)$ and $O(\log^2 m)$ space bounds. This
implies, for example, that online pattern matching with character classes [8] requires linear
space.

Figure 1: An example of $\Delta$ such that $\mathrm{LocalPM}_{(\wedge,\Delta)}$ is invalid (either text or pattern
independent with respect to any pattern $P$).

In Section 6 we go on to consider all eight possible binary Boolean associative operators
and give a complete classification in terms of their known upper and lower space bounds. One
consequence is that determining if there is an exact "non-match", where the Hamming distance
is the same as the pattern length, requires linear space in our online model. This bound also
holds if, for example, only the parity of the Hamming distance is required. In Section 7 we
then show how our techniques can be used to give linear space lower bounds for $L_\infty$ online
pattern matching. In Section 8 we discuss a possible approach to space lower bounds for inputs
with large alphabets, focussing on the Hamming distance problem. Finally, in Section 9 we
explore non-local problems and show $\Omega(m)$ bit space lower bounds for both online edit and
swap distance.

2 Preliminaries and related work 

Let $\Sigma_P$ and $\Sigma_T$ denote the pattern and text alphabet, respectively. We say that $\mathrm{LocalPM}_{(\oplus,\Delta)}$
is text independent with respect to the pattern $P$ if the value of $\mathrm{LocalPM}_{(\oplus,\Delta)}$ is a constant
independent of $T$. We say that $\mathrm{LocalPM}_{(\oplus,\Delta)}$ is pattern independent with respect to a pattern
$P$ if there is a function $\Delta'$ such that $\Delta(x, y) = \Delta'(y)$ for all $(x, y) \in P \times \Sigma_T$.

Example 1. Let $\Sigma_P = \{x, y, z\}$, $\Sigma_T = \{a, b, c\}$, $\oplus$ be the Boolean AND-operator and $\Delta$ be
defined according to the table in Figure 1, where 1 is True and 0 is False. We can see that
$\mathrm{LocalPM}_{(\wedge,\Delta)}$ is text independent with respect to the pattern $P = xxyyxzxx$ as it always
outputs 0. It is also pattern independent with respect to $P = yyzyyzzy$ as $\Delta(y, a) = \Delta(z, a)$
for all $a \in \Sigma_T$. In fact, for this particular definition of $\Delta$, $\mathrm{LocalPM}_{(\wedge,\Delta)}$ is either text or
pattern independent with respect to any pattern $P$.

Suppose that $\mathrm{LocalPM}_{(\oplus,\Delta)}$ is text independent with respect to a pattern $P$. Then
any algorithm for $\mathrm{LocalPM}_{(\oplus,\Delta)}$ on $P$ requires at most $O(1)$ space after preprocessing $P$.
If $\mathrm{LocalPM}_{(\oplus,\Delta)}$ is pattern independent with respect to $P$ then $\mathrm{LocalPM}_{(\oplus,\Delta)}$ does not
depend on the pattern and is outside the scope of this paper.

We say that $\mathrm{LocalPM}_{(\oplus,\Delta)}$ is invalid if, for every pattern $P$, it is either text or pattern
independent with respect to $P$. $\mathrm{LocalPM}_{(\oplus,\Delta)}$ is valid if it is not invalid. The problem
$\mathrm{LocalPM}_{(\wedge,\Delta)}$ in the previous example is therefore invalid. From this point on we will only
consider pattern matching problems $\mathrm{LocalPM}_{(\oplus,\Delta)}$ which are valid, and ignore patterns for
which $\mathrm{LocalPM}_{(\oplus,\Delta)}$ is pattern or text independent.

Our focus is on online pattern matching algorithms which output correct answers with 
constant probability. We are not aware of previous work that considers randomised lower 
bounds for this specific type of problem. There is however now a considerable literature on 
communication complexity and on streaming algorithms for single input streams, including 
those that process a sliding window of the input (see e.g. [1]). This previous streaming work has
typically focussed on deterministic or randomised bounds for finding approximate rather than 
exact solutions. Quantum lower and classical upper bounds for the communication complexity 
of Hamming distance in more general models than we consider were given previously [S]. A 
linear lower bound for the randomised communication complexity of the inner product of two 
binary vectors is given in [3]. The dichotomy presented in Section 5 and in particular the
concept of a matching relation that includes wildcard matching, although in a different setting
and with different terminology, is similar to a time complexity dichotomy given previously by
Muthukrishnan and Ramesh [9]. On the topic of swap matching in Section 9 we note that
in PP, the existence of a reduction for time rather than space, from Boolean convolutions to 
string matching with swaps is claimed without proof. 



3 Communication complexity problems 

Our results are based on reductions from various one-way randomised communication com- 
plexity problems with known lower bounds. We list the relevant problems below. In a one-way 
randomised communication model, only Alice can send messages to Bob and Bob must output 
the correct answer with probability at least 2/3. Note that the value 2/3 is inconsequential:
any probability strictly greater than 1/2 can be amplified to a constant arbitrarily close to 1.
We assume private randomness.

Definition 2. The Equality problem in one-way communication complexity is defined as
follows. Alice has a string $X \in \{0,1\}^m$ and Bob has a string $Y \in \{0,1\}^m$. Bob must determine
whether $X = Y$. The communication complexity is $\Theta(\log m)$ bits [14].

Definition 3. The Indexing problem in one-way communication complexity is defined as
follows. Alice has a string $X \in \{0,1\}^m$ and Bob has an index $n \in \{0, \ldots, m-1\}$. Bob must
find $X[n]$. The problem is known to have an $\Omega(m)$ bit lower bound (see [5] for an elementary
proof).



4 Addition 

In this section we consider the problem $\mathrm{LocalPM}_{(+,\Delta)}$, where $+$ is standard addition and the
range of $\Delta$ is a subset of the integers. That is, the distance function is

$$d(i) = \sum_{j=0}^{m-1} \Delta(P[j], T[i+j]).$$

Theorem 4. $\mathrm{LocalPM}_{(+,\Delta)}$ requires $\Omega(m)$ bits of space.

Proof. Since $\mathrm{LocalPM}_{(+,\Delta)}$ is not text independent, there must exist characters $x \in \Sigma_P$
and $a, b \in \Sigma_T$ such that $\Delta(x, a) \neq \Delta(x, b)$. We reduce from Indexing: Alice has a string
$T \in \{a, b\}^m$ and Bob has an index $n$. Alice initialises a pattern matching algorithm $\mathcal{A}$ on the
pattern $P = x^m$ ($m$ copies of $x$) and feeds in her string $T$. Then she sends the internal state of
$\mathcal{A}$ to Bob, who feeds in $n$ copies of the symbol $a$. Let $d$ be the output after those $a$s. Bob then
feeds in another $a$. Let $d'$ be the output. The window has shifted past $T[n]$ and gained one $a$, so
$d' - d = \Delta(x, a) - \Delta(x, T[n])$. If $d = d'$ then $T[n] = a$. If $d \neq d'$ then $T[n] = b$. If
the probability of error per output is bounded by a constant $c < 1/4$, then the union bound
for error on two outputs is $2c$, giving the Indexing problem an error probability of at most
$2c < 1/2$. □
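The reduction in the proof can be simulated directly. The sketch below is illustrative only: `NaiveLocalPM` is a hypothetical linear-space stand-in for the algorithm $\mathcal{A}$, $\Delta$ is taken to be the Hamming mismatch indicator, and the pattern symbol is chosen as `x = "a"` so that $\Delta(x,a) = 0 \neq 1 = \Delta(x,b)$. None of these names come from the paper.

```python
from collections import deque

class NaiveLocalPM:
    """Hypothetical stand-in for the algorithm: keeps the last m text symbols."""
    def __init__(self, pattern, delta):
        self.pattern = list(pattern)
        self.delta = delta
        self.window = deque(maxlen=len(self.pattern))
        self.last_output = None  # most recent distance reported

    def feed(self, symbol):
        self.window.append(symbol)
        if len(self.window) == len(self.pattern):
            self.last_output = sum(
                self.delta(p, t) for p, t in zip(self.pattern, self.window)
            )
        return self.last_output

def hamming_delta(p, t):
    # Assumed choice of Delta: 1 on a mismatch, 0 on a match.
    return int(p != t)

def index_via_sum(text, n, delta=hamming_delta, x="a", a="a", b="b"):
    """Recover text[n] using only the algorithm's outputs, as Bob does."""
    alg = NaiveLocalPM([x] * len(text), delta)
    for ch in text:            # Alice streams her string
        alg.feed(ch)
    for _ in range(n):         # Bob (holding the state) feeds n copies of a
        alg.feed(a)
    d = alg.last_output        # output before the extra symbol
    d_next = alg.feed(a)       # output after one more a
    return a if d == d_next else b
```

For example, `index_via_sum("abbabaab", 3)` compares the two consecutive outputs and returns the symbol at index 3 of Alice's string.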

Corollary 5. Computing the $L_1$, $L_2$ and Hamming distances, as well as the convolution,
requires $\Omega(m)$ bits of space.



5 Conjunction 

In this section we consider $\mathrm{LocalPM}_{(\wedge,\Delta)}$, where $\wedge$ is the Boolean AND-operator and the
range of $\Delta$ is $\{0, 1\}$ (where 0 denotes False and 1 denotes True). There are several natural
pattern matching problems that fall under this category, for example, exact matching, matching
with wildcards and exact matching with character classes.

Figure 2: The wildcard matrix (left) and negated wildcard matrix (right).

Figure 3: $\Delta$ in the proof of Theorem 6.


The function $\Delta$ can be represented with a 0/1-matrix $M_\Delta$, where the rows and columns
correspond to the symbols in $\Sigma_P$ and $\Sigma_T$, respectively. Thus, the entry $M_\Delta[i,j] = \Delta(i,j)$. The
$2 \times 2$ matrix in Figure 2 will play an important role, and we call it the wildcard matrix.

We say that $M_\Delta$ contains the wildcard matrix if it is a submatrix of $M_\Delta$ under some
permutation of the rows and columns.

We demonstrate the following dichotomy for $\mathrm{LocalPM}_{(\wedge,\Delta)}$. If $M_\Delta$ contains the wildcard
matrix, then $\mathrm{LocalPM}_{(\wedge,\Delta)}$ requires $\Omega(m)$ bits of space; otherwise $\Omega(\log m)$ bits are
necessary and $O(\log^2 m)$ bits suffice. The first class is equivalent to pattern matching with
wildcards, and the second class is equivalent to exact matching. Note that both sides of the
dichotomy are decidable due to the simple characterisation of the function $\Delta$.

Theorem 6. If $M_\Delta$ contains the wildcard matrix, then $\mathrm{LocalPM}_{(\wedge,\Delta)}$ requires $\Omega(m)$ bits of
space.

Proof. Suppose that $*, x \in \Sigma_P$ ($*$ represents a wildcard symbol) and $a, b \in \Sigma_T$ such that $\Delta$ is
specified according to Figure 3. We reduce from the Indexing problem, in which Alice has an
$m$-length string $X \in \{*, x\}^m$ (her bit string, with the two bit values written as $*$ and $x$) and
Bob has an index $n \in \{0, \ldots, m-1\}$. Let the pattern $P$ be the string $X$. Let $\mathcal{A}$ be any algorithm
that solves $\mathrm{LocalPM}_{(\wedge,\Delta)}$ on the pattern $P$. After $\mathcal{A}$ has preprocessed $P$, Alice
sends the internal state of $\mathcal{A}$ to Bob, who feeds the algorithm with the $m$-length string that
has the symbol $a$ at every position except for at position $n$ where the symbol is $b$. The output
is True iff $X[n] = *$. □
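As an illustration of this proof, the following sketch simulates Bob's probe. It assumes the relation that the proof implies ($*$ matches both $a$ and $b$, while $x$ matches only $a$); the figure itself is not reproduced here, and the function names are hypothetical. A naive linear-space scan stands in for the algorithm $\mathcal{A}$.

```python
from collections import deque

# Assumed wildcard relation, inferred from the proof: '*' matches a and b,
# 'x' matches only a.
WILDCARD_DELTA = {("*", "a"): 1, ("*", "b"): 1, ("x", "a"): 1, ("x", "b"): 0}

def and_match_outputs(pattern, text, delta):
    """Yield the AND over j of delta(P[j], T[i+j]) for each full window."""
    m = len(pattern)
    window = deque(maxlen=m)
    for ch in text:
        window.append(ch)
        if len(window) == m:
            yield all(delta[(p, t)] for p, t in zip(pattern, window))

def is_wildcard_at(pattern, n):
    """Bob's side: probe position n of Alice's pattern with a single 'b'."""
    probe = ["a"] * len(pattern)
    probe[n] = "b"
    # A text of length m produces exactly one output.
    (output,) = and_match_outputs(pattern, probe, WILDCARD_DELTA)
    return output
```

Calling `is_wildcard_at("*x**x", 1)` returns `False`, since position 1 holds `x`, which does not match the probe symbol `b`.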

The following lemma will be useful for the next two theorems (see Figure 4).

Lemma 7. Let $M'_\Delta$ be the matrix obtained from $M_\Delta$ by first removing copies of identical rows
and columns, keeping only rows and columns that are distinct in $M_\Delta$, and then removing any
row or column that contains only zeros. If $M_\Delta$ does not contain the wildcard matrix, then $M'_\Delta$
is the identity matrix, under some permutation of rows and columns.

Proof. Suppose that $M_\Delta$ does not contain the wildcard matrix. Let $M'_\Delta$ be obtained from $M_\Delta$
according to the statement of the lemma. We will show that every column and every row of
$M'_\Delta$ contains exactly one 1.

First we show that every row of $M'_\Delta$ must contain at least one 1. Suppose that some row
$r$ of $M'_\Delta$ contains only 0s. Since zero-rows of $M_\Delta$ were removed and one copy of each column
remains after the removal process, it is not possible that all columns in which row $r$ is 1 were
removed, a contradiction. We now show that $M'_\Delta$ cannot contain a row $r$ with two or more 1s.
Without loss of generality, assume that there is a 1 in columns $i$ and $j$ of row $r$. Since $M_\Delta$ does
not contain a wildcard matrix, the elements of columns $i$ and $j$ must both be either 0 or 1 in every
row. Thus, columns $i$ and $j$ are identical, and one of them must have been removed, contradicting
the fact that there are two 1s in row $r$ of $M'_\Delta$. In order to show that every column of $M'_\Delta$ contains
exactly one 1, we use the exact same argument as for the rows. Thus, $M'_\Delta$ is the identity
matrix, under some permutation of rows and columns. (See Figure 4 for an illustration of the
lemma.) □
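The lemma can be checked mechanically on small matrices. The sketch below is a minimal illustration under one stated assumption: the wildcard matrix is taken to be the $2 \times 2$ matrix with three 1s and one 0 (as implied by the proof of Theorem 6), so "contains the wildcard matrix" means some $2 \times 2$ submatrix has exactly three 1s. The function names are hypothetical, and the example matrix is made up for illustration.

```python
from itertools import combinations

def contains_wildcard(M):
    """True if some 2x2 submatrix of M has exactly three 1s and one 0."""
    for r1, r2 in combinations(range(len(M)), 2):
        for c1, c2 in combinations(range(len(M[0])), 2):
            if M[r1][c1] + M[r1][c2] + M[r2][c1] + M[r2][c2] == 3:
                return True
    return False

def _dedup(rows):
    """Keep the first copy of each distinct row, preserving order."""
    seen, out = set(), []
    for r in rows:
        if tuple(r) not in seen:
            seen.add(tuple(r))
            out.append(list(r))
    return out

def _transpose(rows):
    return [list(c) for c in zip(*rows)]

def reduce_matrix(M):
    """Drop duplicate rows and columns, then all-zero rows and columns."""
    rows = _dedup(M)
    rows = _transpose(_dedup(_transpose(rows)))
    rows = [r for r in rows if any(r)]
    cols = [c for c in _transpose(rows) if any(c)]
    return _transpose(cols)

def is_permuted_identity(M):
    """True if M is square with exactly one 1 in every row and column."""
    return (len(M) == len(M[0])
            and all(sum(r) == 1 for r in M)
            and all(sum(c) == 1 for c in zip(*M)))

# Made-up example with duplicate rows/columns and a zero row and column.
EXAMPLE = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 0],
           [0, 0, 0, 0]]
```

Here `EXAMPLE` contains no wildcard matrix, and `reduce_matrix(EXAMPLE)` yields a $2 \times 2$ identity matrix, in line with the lemma.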

Theorem 8. If $M_\Delta$ does not contain the wildcard matrix, then $\mathrm{LocalPM}_{(\wedge,\Delta)}$ requires
$\Omega(\log m)$ bits of space.

Figure 4: An illustration of Lemma 7.

Proof. We reduce from the Equality problem, where Alice has a string $X \in \{0,1\}^m$ and Bob
has a bit string $Y \in \{0,1\}^m$. Since $M_\Delta$ does not contain the wildcard matrix and as we only
consider problems $\mathrm{LocalPM}_{(\wedge,\Delta)}$ that are valid, it follows from Lemma 7 that there must
exist $x, y \in \Sigma_P$ and $a, b \in \Sigma_T$ such that $\Delta$ is according to Figure 5. Let $P$ be the $m$-length
pattern obtained from $X$ by replacing every 0 with $x$ and every 1 with $y$. The $m$-length text $T$
is obtained similarly from $Y$ by replacing every 0 with $a$ and every 1 with $b$. For any algorithm
$\mathcal{A}$ that solves $\mathrm{LocalPM}_{(\wedge,\Delta)}$ on the pattern $P$, Alice sends the internal state of $\mathcal{A}$ on pattern
$P$ to Bob, who feeds $\mathcal{A}$ with $T$. The output is True iff $X = Y$. □

Theorem 9. If $M_\Delta$ does not contain the wildcard matrix, then $\mathrm{LocalPM}_{(\wedge,\Delta)}$ can be solved
in $O(\log^2 m)$ bits of space.

Proof. We will describe an algorithm for solving $\mathrm{LocalPM}_{(\wedge,\Delta)}$ which uses the exact matching
algorithm by Porat and Porat [13], which runs in $O(\log m)$ words of space, that is, $O(\log^2 m)$
bits of space (under the word-RAM model). In order to use the exact matching algorithm (as
a "black box") we must ensure that we do not feed it with distinct symbols that are identical
under $\Delta$. In other words, we can think of $\Delta$ as specifying character classes, and for each class we
want to use one representative symbol. We formalise this below.

We make the very reasonable assumption that the alphabets $\Sigma_P$ and $\Sigma_T$ are both enumerable
and that we can iterate through every symbol of $\Sigma_P$ and $\Sigma_T$, respectively, in no more
than $O(\log m)$ bits of space. Let the order by which we iterate through the alphabets describe
an ordering of the symbols in $\Sigma_P$ and $\Sigma_T$. We say that the symbol $x \in \Sigma_P$ is smaller than
$y \in \Sigma_P$ if $x$ appears before $y$ when iterating through $\Sigma_P$. We use the same notation for the
symbols of $\Sigma_T$. We say that two symbols $x, y \in \Sigma_P$ are equivalent if $\Delta(x, a) = \Delta(y, a)$ for all
$a \in \Sigma_T$. Similarly, $a, b \in \Sigma_T$ are equivalent if $\Delta(x, a) = \Delta(x, b)$ for all $x \in \Sigma_P$. We define the
smallest equivalent symbol of $x \in \Sigma_P$ to be the symbol $y \in \Sigma_P$ such that $y$ is equivalent to
$x$ and no other symbol equivalent to $x$ is smaller than $y$. The notion of smallest equivalent
symbol is defined similarly on $\Sigma_T$.

Let $\Sigma'_P \subseteq \Sigma_P$ be the set of all symbols $x \in \Sigma_P$ such that the smallest equivalent symbol
of $x$ is $x$ itself. We do not include any symbol $x$ in $\Sigma'_P$ such that $\Delta(x, a) = 0$ for all $a \in \Sigma_T$.
Similarly, let $\Sigma'_T \subseteq \Sigma_T$ be the set of all symbols $a \in \Sigma_T$ such that the smallest equivalent
symbol of $a$ is $a$ itself. We do not include any symbol $a$ in $\Sigma'_T$ such that $\Delta(x, a) = 0$ for all
$x \in \Sigma_P$. By Lemma 7 we have that $\Delta$ on $\Sigma'_P$ and $\Sigma'_T$ is represented by an identity matrix
under some permutation of the rows and columns. In the example of Figure 4, $\Sigma'_P = \{x, v\}$
and $\Sigma'_T = \{a, b\}$. We will ensure that we use the exact matching algorithm of [13] only on $\Sigma'_P$
and $\Sigma'_T$ (i.e., normal exact pattern matching).

Given a symbol $x \in \Sigma_P$, we can find its smallest equivalent symbol by iterating through
every symbol $y \in \Sigma_P$ and for each $y$, we iterate through all $a \in \Sigma_T$ to check whether $\Delta(x, a) =
\Delta(y, a)$. Similarly we can find the smallest equivalent symbol of any symbol in $\Sigma_T$.
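The search for a smallest equivalent symbol can be sketched as two nested scans that keep only the current loop positions. The code below is an illustration rather than the paper's implementation: the function name is hypothetical, and the example relation (case-insensitive matching over a four-letter pattern alphabet) is an assumption chosen only because it contains no wildcard matrix.

```python
def smallest_equivalent(x, sigma_p, sigma_t, delta):
    """Return the first symbol of sigma_p, in iteration order, equivalent to x.

    Two pattern symbols are equivalent if delta agrees on them for every
    text symbol. Only the two loop positions need to be remembered.
    """
    for y in sigma_p:
        if all(delta(x, a) == delta(y, a) for a in sigma_t):
            return y
    raise ValueError("x does not occur in sigma_p")

def case_insensitive_delta(p, t):
    # Assumed example relation: a pattern letter matches the text letter
    # with the same lower-case form.
    return int(p.lower() == t)

SIGMA_P = "AaBb"   # iteration order defines which symbol is "smaller"
SIGMA_T = "ab"
```

With these definitions, `smallest_equivalent("b", SIGMA_P, SIGMA_T, case_insensitive_delta)` returns `"B"`, the first symbol in the iteration order that behaves identically to `"b"` under the relation.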

Let $P$ be the pattern. We may assume that $P$ does not contain a symbol $x$ for which
$\Delta(x, a) = 0$ for all $a \in \Sigma_T$. If it does, the output is always 0. Before we preprocess the pattern,
we replace every symbol with its smallest equivalent symbol. Then we preprocess the pattern
using the fingerprint technique described in [13]. Now we run the exact matching algorithm
with the following additional step. When a new symbol $a$ arrives, we replace it with its smallest
equivalent symbol. The only caveat we must take care of is the situation when $\Delta(x, a) = 0$ for
all $x \in \Sigma_P$. We can detect this case by iterating through the symbols of $\Sigma_P$. As long as $a$ is
present in the last $m$ characters of the stream, the output is zero. We use a flag to keep track

Figure 5: $\Delta$ in the proof of Theorem 8.



We now show how these results can be applied to a specific pattern matching problem that
has not been considered in the online setting before. The pattern matching with character
classes problem allows a set of characters to be defined for each position in the pattern [8].
A character in the text matches a set at a pattern position if it is contained within it. This
is a generalisation of exact matching where each set would contain only one character. Using
Theorems 6, 8 and 9 we can determine precisely when this problem can and cannot be solved
online in sublinear space.

Corollary 10. Online pattern matching with character classes requires $\Omega(m)$ bits of space in
the worst case. However, where the character classes define a matching relation $\Delta$ which does
not contain the wildcard matrix (see the example in Figure 4), $O(\log^2 m)$ bits suffice.

6 Other Boolean operators 

In the previous section we demonstrated a dichotomy for $\mathrm{LocalPM}_{(\oplus,\Delta)}$, where $\oplus$ is the
AND-operator. Here we will complete the classification of Boolean operators. There are eight
associative Boolean operators $a \oplus b$:

1. True   2. False   3. $a$   4. $b$   5. $a \wedge b$   6. $a \vee b$   7. $a = b$   8. $a \neq b$

The operators True and False are trivial; the output is either always True or always False.
The operator $a \oplus b = b$ is also easy; the output is always $\Delta(P[m-1], t)$, where $t$ is the last
received symbol of the text stream.

The operator $a \oplus b = a$ is on the other hand more demanding. Here the output is $\Delta(P[0], t)$,
where $t$ is the $m$th last symbol received from the text stream. The pattern matching algorithm
must therefore remember $m$ received characters of the stream. More precisely, we see that
$\Omega(m)$ bits of space is necessary by reducing from the Indexing problem: Alice first feeds
her array (text) into the pattern matching algorithm, for which $P[0]$ is a character that can
distinguish between the characters of Alice's array. She then sends the internal state to Bob,
who feeds in $n$ symbols in order to determine the value at index $n$ of Alice's array.

The OR-operator $\vee$ is equivalent to $\wedge$ under De Morgan's laws: negate the outputs from
$\Delta$ and negate the output from the pattern matching algorithm. Thus, the dichotomy for $\wedge$
applies to $\vee$ as well, only that we characterise the classes with the wildcard matrix in which
each element has been negated. This is called the negated wildcard matrix (see Figure 2).

We now show that the equality operator "=" requires $\Omega(m)$ bits of space. First note that
the output from the pattern matching algorithm is 0 if and only if $\Delta(P[j], T[i+j]) = 0$ for an
odd number of positions $j$. For example, if $M_\Delta$ is the identity matrix, $\mathrm{LocalPM}_{(=,\Delta)}$ gives
us the parity of the Hamming distance.

Since $\mathrm{LocalPM}_{(=,\Delta)}$ is valid, there are $x \in \Sigma_P$, $a, b \in \Sigma_T$ such that $\Delta(x, a) = 0$ and
$\Delta(x, b) = 1$. We reduce from the Indexing problem, where Alice has a string in $\{a, b\}^m$ and
Bob has an index $n$. Alice initialises a pattern matching algorithm on the pattern $P = x^m$
and feeds it with her string. She sends the internal state to Bob, who feeds the algorithm with
$n$ copies of the symbol $a$. The first position of $P$ is now aligned with the $n$th character of
Alice's string. Suppose the output from the algorithm is $d$. Bob now feeds in another $a$. Let
$d'$ be the new output. If $d = d'$ then the character at position $n$ of Alice's string must have
been $a$. If $d \neq d'$ then the character must have been $b$.

Figure 6: $\Delta$ and $\Delta'$ in the proof of Theorem 11.

The operator "$\neq$" is similar to "=" and also requires $\Omega(m)$ bits of space. To see this, note
that the output from the pattern matching algorithm is 0 if and only if $\Delta(P[j], T[i+j]) = 1$
for an even number of positions $j$. We may therefore prove the lower bound using a reduction
from the Indexing problem similar to above.
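The two parity statements used above can be confirmed by brute force for small $m$. The sketch below folds the operators "=" and "$\neq$" over every bit vector of a given length and checks the stated characterisations; the function names are illustrative and not taken from the paper.

```python
from functools import reduce
from itertools import product

def xnor(a, b):
    """The Boolean operator a = b."""
    return int(a == b)

def xor(a, b):
    """The Boolean operator a != b."""
    return int(a != b)

def parity_claims_hold(m):
    """Check both parity characterisations on all bit vectors of length m."""
    for bits in product((0, 1), repeat=m):
        zeros, ones = bits.count(0), bits.count(1)
        # "=": output is 0 iff the number of 0 entries is odd.
        if (reduce(xnor, bits) == 0) != (zeros % 2 == 1):
            return False
        # "!=": output is 0 iff the number of 1 entries is even.
        if (reduce(xor, bits) == 0) != (ones % 2 == 0):
            return False
    return True
```

Running `parity_claims_hold(m)` for small values of `m` exercises every possible vector of $\Delta$ values in a window of that length.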

7 The $L_\infty$ distance

In this section we consider the $L_\infty$ distance problem which can be defined as $\mathrm{LocalPM}_{(\max,\Delta)}$,
where $\Delta(x, y) = |x - y|$ and $\max(a, b)$ is the maximum of $a$ and $b$. In this section we assume
that the pattern and text are integer valued. Here the distance function is the maximum
$\Delta(P[j], T[i+j])$ over all $j$, that is

$$d(i) = \max_{j} \Delta(P[j], T[i+j]).$$

Theorem 11. The $L_\infty$ distance problem requires $\Omega(m)$ bits of space.

Proof. Let $\Sigma_P = \{0, 1\}$ and $\Sigma_T = \{2, 3\}$. Therefore $\Delta$ is specified according to Figure 6. Let
$\Delta'(x, y) = 1$ if $\Delta(x, y) < 3$, otherwise $\Delta'(x, y) = 0$. Therefore $M_{\Delta'}$ contains the wildcard matrix
and hence by Theorem 6, $\mathrm{LocalPM}_{(\wedge,\Delta')}$ requires $\Omega(m)$ space.

Let $d'(i)$ be the distance under $\mathrm{LocalPM}_{(\wedge,\Delta')}$. If $d'(i) = 1$ then for all $j$, $\Delta'(P[j], T[i+j]) = 1$,
implying that $\Delta(P[j], T[i+j]) < 3$ for all $j$. Hence $d(i) < 3$. If $d'(i) = 0$ then
there exists a $j$ such that $\Delta'(P[j], T[i+j]) = 0$, implying that $\Delta(P[j], T[i+j]) = 3$ and hence
$d(i) = 3$. Therefore, if we can solve $\mathrm{LocalPM}_{(\max,\Delta)}$, we can solve $\mathrm{LocalPM}_{(\wedge,\Delta')}$. □
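The correspondence used in this proof can be checked exhaustively for short windows. The sketch below is a small sanity check under the alphabets of the proof ($\{0,1\}$ for the pattern and $\{2,3\}$ for the text); the function names are hypothetical.

```python
from itertools import product

def linf(pattern, window):
    """L-infinity distance between a pattern and an aligned window."""
    return max(abs(p - t) for p, t in zip(pattern, window))

def delta_prime(p, t):
    """Delta'(p, t) = 1 if |p - t| < 3, and 0 otherwise."""
    return int(abs(p - t) < 3)

def and_match(pattern, window):
    """Output of the AND problem under delta_prime for one alignment."""
    return int(all(delta_prime(p, t) for p, t in zip(pattern, window)))

def reduction_holds(m):
    """Check that d(i) < 3 exactly when d'(i) = 1, over all inputs of length m."""
    for pattern in product((0, 1), repeat=m):
        for window in product((2, 3), repeat=m):
            if (linf(pattern, window) < 3) != (and_match(pattern, window) == 1):
                return False
    return True

# Matrix of delta_prime with rows 0, 1 (pattern) and columns 2, 3 (text).
M_DELTA_PRIME = [[delta_prime(p, t) for t in (2, 3)] for p in (0, 1)]
```

`M_DELTA_PRIME` has three 1s and a single 0 (at pattern symbol 0, text symbol 3), which is the shape the proof relies on when it invokes the wildcard lower bound.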

8 Non-binary alphabets 

The space lower bounds we have given so far have been either $\Omega(\log m)$ or $\Omega(m)$ bits. When
the pattern or text alphabet is drawn from a large universe, the question arises as to whether
even more space is required to perform online pattern matching. We show, by way of a
different reduction, a method that may be applicable to a wider range of pattern matching
problems than we consider here. Our approach is to show a reduction from the communication
complexity problem Disjointness [7] to the Hamming distance problem. In Disjointness
Alice and Bob both have sets of $m$ elements each chosen from a universe of size $U$ and Bob
wants to determine if their intersection is empty. The lower bound for the space complexity
of the Hamming distance problem will then be determined by lower bounds for the one-way
randomised communication complexity of the Disjointness problem with private coins. A
result regarded as folklore shows that this complexity is $\Omega(m \log m + \log\log U)$ when $U$ is
$\Omega(m^{1+\varepsilon})$ [11, 12]. This in turn implies a superlinear lower bound for the space complexity of
the online Hamming distance problem with large alphabets.

For an integer $n$, we write $[n]$ to denote the set $\{0, \ldots, n-1\}$. Alice has a set $A \subseteq [U]$
and Bob has a set $B \subseteq [U]$, and $|A| = |B| = m$. The reduction performs the following steps.
We assume for the moment that Alice and Bob both have a shared source of randomness and
show later how this assumption can be removed.






1. Alice creates a pairwise independent hash function $h : [U] \to [cm]$, for some constant
integer $c > 1$ and creates a pattern $P$ of length $cm$ where each element is initialised to be
some unique symbol $\$ \notin [U]$. She then sets $P[h(x)] = x$ for all $x \in A$ by going through
$A$ in some arbitrary order. If a position of $P$ is written to multiple times, only the last
write is stored.

2. Alice starts the Hamming distance algorithm up until the point at which it has processed
the pattern $P$ but none of the text (which is created later) and sends the internal state
of the algorithm to Bob.

3. Bob performs the same hashing operation using the same hash function but this time on
set $B$, creating a text $T$ of length $cm$. Bob uses a different unique symbol $\$' \notin [U]$ for
the initialisation of the text.

4. Bob feeds the Hamming distance algorithm with the whole text $T$. Bob concludes that
$A$ and $B$ are disjoint iff the output is $cm$.
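The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction in full: the hash family $h(x) = ((\alpha x + \beta) \bmod p) \bmod cm$ with the Mersenne prime $p = 2^{61} - 1$ is a common stand-in that is only approximately pairwise independent, the shared randomness is modelled by a common seed, and an exact Hamming distance computation replaces the streaming algorithm whose state Alice would send. All function names are hypothetical.

```python
import random

P61 = (1 << 61) - 1  # Mersenne prime, assumed to exceed the universe size U

def make_hash(buckets, rng):
    """Assumed stand-in hash family: ((alpha*x + beta) mod p) mod buckets."""
    alpha = rng.randrange(1, P61)
    beta = rng.randrange(P61)
    return lambda x: ((alpha * x + beta) % P61) % buckets

def build(elements, h, length, filler):
    """Steps 1 and 3: place each element at its hashed position."""
    s = [filler] * length
    for e in elements:
        s[h(e)] = e          # a later write overwrites an earlier one
    return s

def hamming(p, t):
    return sum(1 for x, y in zip(p, t) if x != y)

def looks_disjoint(A, B, c=8, seed=0):
    """Step 4: report 'disjoint' iff the Hamming distance equals c*m."""
    rng = random.Random(seed)          # models the shared random source
    m = len(A)
    h = make_hash(c * m, rng)
    pattern = build(A, h, c * m, "$")  # Alice's pattern, filler $
    text = build(B, h, c * m, "$'")    # Bob's text, filler $'
    return hamming(pattern, text) == c * m
```

When the sets are disjoint the answer is always correct, because every aligned pair of symbols differs. When they intersect, a wrong answer requires a hash collision that overwrites a shared element, which is the event bounded in the proof that follows.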

Theorem 12. Any randomised algorithm for Hamming distance where the symbols are chosen
from a universe of size $\Omega(m^{1+\varepsilon})$ uses $\Omega(m \log m + \log\log U)$ bits of space.

Proof. Considering the reduction above, if $A$ and $B$ are disjoint, then a deterministic Hamming
distance algorithm will always output $cm$. If $A$ and $B$ are not disjoint then a necessary condition
for a deterministic Hamming distance algorithm to output $cm$ is that at least two elements are
hashed to the same location by either Alice or Bob. We can see that the probability of
incorrectly outputting $cm$ is maximised when $A$ and $B$ share exactly one element. Therefore,
suppose that $A \cap B = \{x\}$. The element $x$ is hashed to position $h(x)$. By the union bound
and the pairwise independence of the hash function, the probability that some other element
in either $A$ or $B$ is mapped to $h(x)$ is at most $\frac{1}{cm} \cdot m \cdot 2 = 2/c$. If we assume our
randomised Hamming distance algorithm is correct with probability at least 2/3, then the
overall process falsely reports disjointness with probability at most $2/c + 1/3$ (union bound).
The space complexity of Hamming distance is therefore lower bounded by the communication
complexity of the Disjointness problem if Alice and Bob have a shared source of random bits
to select their common hash function. By Newman's Theorem [10] the cost of transforming
the protocol to work with only private coins is at most an additive $O(\log\log U)$ term in the
asymptotic complexity. Assuming that $U$ grows polynomially in $m$ and so $\log\log U$ is $O(\log m)$,
the overall lower bound for the space complexity of the Hamming distance problem is therefore
$\Omega(m \log m - \log m) = \Omega(m \log m)$. To finish the proof for larger $U$, we observe first that a lower
bound for smaller universes must still hold for larger ones. The final additive $\Omega(\log\log U)$ term
is derived by simply setting $m = 1$ and follows directly from the randomised lower bound for
Equality. Therefore the overall lower bound is $\Omega(m \log m + \log\log U)$ as required. □

9 Non-local pattern matching 

So far we have focused only on local pattern matching where each position in the alignment 
contributes to the distance independently of the other positions. Here we take a brief look at 
space lower bounds for two non-local distance measures: edit distance and swap matching. 

In online pattern matching, we define the edit distance as the minimum number of single 
character edit operations (insert, delete and replace) required to transform P into the last m 
characters of the streamed text. This implies that the number of insertions and deletions are 
equal. 

We show that for binary $\Sigma_P = \Sigma_T = \{0, 1\}$, the online edit distance problem requires
$\Omega(m)$ bits of space. For non-binary inputs there is a reduction from the Hamming distance
problem [5]. The reduction we give covers the binary alphabet case as well and follows directly
from Indexing, where Alice has a string $P \in \{0,1\}^m$ and Bob has an index $n$. Alice initialises
a pattern matching algorithm on the pattern $P$ and sends the internal state to Bob, who first
feeds in $m$ zeros. Let $d$ be the output and note that $d$ is the number of ones in $P$. Bob then
feeds in the $m$-length string that consists of zeros at every position except for at position $n$
where it is one. Let $d'$ be the output. Bob can now decide the value of $P[n]$ by comparing $d$
with $d'$: $P[n] = 1$ if $d' < d$, and $P[n] = 0$ if $d' \geq d$. The probability of error is therefore upper
bounded by the union bound on $d$ and $d'$ being wrong.

Figure 7: $\Delta$ and alignments under swaps.

Given a string S, a swap at position i means that the characters S[i] and S[i + 1] swap 
positions. We say there is a swap match if and only if the pattern P can be transformed into 
the last m characters of the streamed text through a set of swaps. Each S[i] is swapped at 
most once. 

We show that the online swap distance problem requires $\Omega(m)$ space. Our proof is based
on the techniques we have presented in this paper. Specifically, we demonstrate a reduction
from $\mathrm{LocalPM}_{(\wedge,\Delta)}$ where $M_\Delta$ contains the wildcard matrix, hence the space lower bound
is $\Omega(m)$. Suppose we have $\Delta$ as in Figure 7. Let $P \in \{*, x\}^m$ and $\Sigma_T = \{a, b\}$. From $P$ we
obtain $P' \in \{0,1\}^{5m}$ such that every $*$ in $P$ is replaced with 00100 and every $x$ is replaced
with 00010. When we receive characters from the text, we replace $a$ with 00010 and $b$ with
01000. It follows, under the transformation of the symbols, that there is a swap match if and
only if $\mathrm{LocalPM}_{(\wedge,\Delta)}$ outputs True for the original (non-transformed) strings. To see this,
note that both $a$ and $b$, under the transformation, swap match $*$ but $b$ does not swap match
$x$ (see Figure 7). The transformation of the symbols does not allow swaps across the encodings
of adjacent symbols; every possible swap will take place "within" the binary encoding of a symbol.
Thus, a swap match directly corresponds to a match under $\mathrm{LocalPM}_{(\wedge,\Delta)}$.
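The encoding argument can be verified with a small swap-match checker. The sketch below uses the four 5-bit codes given in the text; the greedy left-to-right checker and the helper names are assumptions made for illustration. The greedy rule is sound here because a swap of two positions is only ever needed where the current characters disagree.

```python
# 5-bit codes from the text: pattern symbols * and x, text symbols a and b.
ENC = {"*": "00100", "x": "00010", "a": "00010", "b": "01000"}

def swap_match(p, t):
    """True if p becomes t via adjacent swaps, each position used at most once."""
    if len(p) != len(t):
        return False
    i, n = 0, len(p)
    while i < n:
        if p[i] == t[i]:
            i += 1                       # no swap needed at position i
        elif i + 1 < n and p[i] == t[i + 1] and p[i + 1] == t[i]:
            i += 2                       # swap positions i and i+1
        else:
            return False
    return True

def encode(s):
    """Concatenate the 5-bit codes of the symbols of s."""
    return "".join(ENC[ch] for ch in s)

def wildcard_match(pattern, text):
    """AND over positions of the relation: * matches a and b, x matches only a."""
    return all(p == "*" or t == "a" for p, t in zip(pattern, text))
```

For instance, `swap_match(encode("*x*"), encode("bab"))` is `True`, while replacing the middle text symbol with `b` makes it `False`, mirroring the behaviour of the wildcard relation on the unencoded strings.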

10 Open problems 

We have considered space lower bounds and discussed how they can be derived from known 
communication complexity lower bounds. Upper bounds can also be directly derived from 
existing online pattern matching algorithms. For all the problems we have discussed there is 
at most a log factor gap between these upper and lower bounds. However, where the known 
lower bound is sublinear, as is the case for exact matching for example, this gap may still be 
considered significant. Further, for bounded Hamming distance where the distance is only to
be given if it is at most some constant $k$, the best known randomised online space upper bound
is $O(k^3\, \mathrm{polylog}\, m)$ [13]. The best known lower bound, on the other hand, is very different
at $\Omega(k)$ [5]. Further, it is known that the lower bounds cannot be increased to match the
known upper bounds using the one-way communication complexity of the functions between
two strings of the same length. Either more space efficient algorithms exist for these problems
or novel techniques will be needed to improve the lower bounds.